Text File | 1990-08-01
Subject: Novell E-Net Shell Loading Problem (OS Problem)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Technical Description and Resolution
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!! NOTE: This problem has been fixed in later releases !!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
NOTE: This problem is actually an operating system bug. The
probability that this problem will ever be encountered in the field
is extremely low. We encountered the problem only because we were
extensively testing every possible configuration. The
configuration in which we found the problem is unlikely to be
used in practice by NetWare users.
PROBLEM: Under specific circumstances, the Novell E-Net shell fails to
load properly about 15% of the time. This occurs only under
the following conditions. The file server is an IBM PC XT
running NetWare 86 v2.0a with a Novell E-Net interface card.
The network consists of at least 3 workstations, all with
Novell E-Net interface cards. All but one of the
workstations are logged in and are running disk-intensive
programs. The last workstation then attempts to load the
shell, but about 15% of the time it gets the following error
message before getting fully connected to the file server:
Network Error on Server SERVERNAME:Error reading from
network. Abort or Retry?
Attempting to retry only causes the error to be redisplayed.
The workstation must then be rebooted before attempting to
reload the shell. Once the shell is successfully loaded, the
system will run indefinitely without any errors.
CAUSE: This error is caused by the way that NetWare buffers incoming
packets, coupled with the very high network speed of E-Net
and the extreme slowness of the IBM PC XT hard disk.
The problem occurs when the workstation that is attempting
to load the shell gets out of synchronization with the file
server. This happens just as the shell requests initial
service from the server, the most critical point of
communication between the workstation and the file server.
From then until the workstation is rebooted, communication
between the workstation and the file server is essentially
deadlocked. The workstation keeps sending requests to the
server, but the server keeps ignoring them because their
sequence numbers do not match the one it expects.
This error condition is not caused by the Novell E-Net
boards themselves; the network's high speed merely aggravated
it. Potentially, this error could occur with any very
high-speed network running on a file server that has an
extremely slow hard disk.
TECHNICAL EXPLANATION: When the shell is first loaded on a workstation, it sends an
"Allocate Slot Request Packet" to the file server, asking
it to open a slot for the workstation. The handling of this
packet is critical because several parameters are initialized
at this time. If the file server is doing considerable
processing when packets arrive, they are placed in a LIFO
(last-in, first-out) buffer known as the Turbo Receive Buffer.
The "Allocate Slot Request Packet" is placed in the buffer along
with other packets until the operating system gets around to
processing them. If the file server is extremely busy, the
workstation shell times out. Assuming that the request packet
may have been lost, it sends a retry "Allocate Slot Request
Packet". This retry packet is also received and stored in the
Turbo Receive Buffer.
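The reordering effect of a LIFO buffer can be illustrated with a short Python sketch. It assumes packets are plain descriptive strings; the TurboReceiveBuffer class and the packets from workstations B and C are hypothetical stand-ins, not NetWare's actual data structures.

```python
class TurboReceiveBuffer:
    """Hypothetical last-in, first-out holding area for unprocessed packets."""

    def __init__(self) -> None:
        self._stack: list[str] = []

    def receive(self, packet: str) -> None:
        # Each arriving packet is pushed on top of the stack.
        self._stack.append(packet)

    def next_packet(self) -> str:
        # LIFO: the most recently received packet is processed first.
        return self._stack.pop()

    def __len__(self) -> int:
        return len(self._stack)


buf = TurboReceiveBuffer()
buf.receive("Read request from workstation B")
buf.receive("Allocate Slot Request (original)")
buf.receive("Write request from workstation C")
buf.receive("Allocate Slot Request (retry)")  # sent after the shell timed out

# Drain the buffer in the order the busy server would process it.
order = [buf.next_packet() for _ in range(len(buf))]
print(order)
```

The retry comes off the stack first, while the original request stays buried beneath other traffic until the backlog is cleared.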
Now the operating system finally completes its other tasks and
starts to process the incoming packets. Because the buffer
is a LIFO, the retry "Allocate Slot Request Packet" is
processed first. The slot parameters are initialized,
including the packet sequence number, which is the number of
the next packet in the sequence of communications between the
file server and that specific workstation. A reply is
generated and sent back to the workstation, and the packet
sequence number stored in the file server is incremented. The
workstation then sends its next request packet, again
incrementing the packet sequence number. The file server
buffers the incoming packet, eventually processes it, and
sends back a reply packet. Again the packet sequence number
is incremented.
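The per-slot sequence bookkeeping can be sketched as follows. This is a simplified model, not NetWare code: Slot, allocate_slot and handle_request are hypothetical names, the counter counts requests only (the text also counts replies, so its numbers run higher), and the starting value of 1 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    """Simplified per-workstation state kept by the file server (hypothetical)."""
    expected_seq: int = 0  # sequence number the server will accept next


def allocate_slot(slot: Slot) -> None:
    # An "Allocate Slot Request Packet" (re)initializes the slot parameters,
    # including the sequence number; 1 is an assumed starting value.
    slot.expected_seq = 1


def handle_request(slot: Slot, seq: int) -> bool:
    """Accept a request only if it carries the expected sequence number."""
    if seq != slot.expected_seq:
        return False          # out of sequence: discarded
    slot.expected_seq += 1    # request processed and reply sent; advance
    return True


slot = Slot()
allocate_slot(slot)                                   # expected_seq is now 1
accepted = [handle_request(slot, 1), handle_request(slot, 2)]
print(accepted, slot.expected_seq)                    # [True, True] 3
```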
Finally, the file server gets around to processing the
original "Allocate Slot Request Packet" that had been buried
at the bottom of the stack. This packet causes the file
server to reinitialize the slot parameters for that particular
workstation, including the packet sequence number. The file
server then sends out a reply to this packet, which the
workstation ignores because it carries the wrong sequence number.
Now the file server will no longer accept and process packets
from the workstation because the sequence numbers are out of
synchronization. The workstation is sending valid packets
with valid sequence numbers to the file server, but since the
file server's sequence number counter has been reinitialized,
none of these packets are recognized and they are discarded.
For example, the file server is expecting packet number two
(since the sequence number was reinitialized), but the
workstation is attempting to send packet number four or
higher. The workstation can retry
sending the new packets to the file server forever and the
server will never process them. Thus the communication
between the file server and the workstation is deadlocked
until the workstation is rebooted and the shell sends a new
"Allocate Slot Request Packet."
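The complete failure sequence can be replayed in a small Python simulation that combines a LIFO receive buffer with a duplicate Allocate Slot request. FileServer and process_one are hypothetical names, and as a simplification only requests are numbered, so the concrete sequence numbers differ from those in the example above while the outcome is the same.

```python
ALLOCATE = "allocate"


class FileServer:
    """Hypothetical model of one workstation slot behind a LIFO receive buffer."""

    def __init__(self) -> None:
        self.buffer: list[tuple[str, int]] = []  # LIFO "Turbo Receive Buffer"
        self.expected_seq = 0                    # next sequence number accepted

    def receive(self, kind: str, seq: int = 0) -> None:
        self.buffer.append((kind, seq))

    def process_one(self) -> str:
        """Pop the newest packet and describe what the server did with it."""
        kind, seq = self.buffer.pop()
        if kind == ALLOCATE:
            self.expected_seq = 1  # slot parameters (re)initialized
            return "slot allocated"
        if seq == self.expected_seq:
            self.expected_seq += 1
            return f"request {seq} processed"
        return f"request {seq} discarded"


server = FileServer()
log = []
server.receive(ALLOCATE)          # original request arrives while the server is busy
server.receive(ALLOCATE)          # shell times out and sends a retry
log.append(server.process_one())  # retry is handled first (LIFO)
server.receive("request", 1)      # workstation's next request lands on top
log.append(server.process_one())
log.append(server.process_one())  # buried original resets the slot
for _ in range(3):                # workstation keeps retrying request 2
    server.receive("request", 2)
    log.append(server.process_one())
print("\n".join(log))
```

After the stale request resets the counter, the server expects request 1 again while the workstation has moved on to request 2, so every retry is discarded until a reboot produces a fresh Allocate Slot request.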
If the shell successfully loads and connects to the file
server it means that the "Allocate Slot Request Packet" has
been serviced properly without duplication. The system will
then continue to operate correctly indefinitely. Packets may
still be processed out of order because of the LIFO buffer.
However, this has no effect on the processing because the
sequence numbers are still synchronized between the file
server and the workstation.
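Why LIFO reordering is harmless once the slot is set up can be shown with a second sketch. It assumes, as a simplification not stated in this bulletin, that each workstation has at most one request outstanding at a time, so reordering happens only between different workstations; the workstation names A, B and C and the process_all helper are hypothetical.

```python
# Next expected sequence number for each workstation's slot (hypothetical).
slots = {"A": 1, "B": 1, "C": 1}
# LIFO receive buffer holding (workstation, sequence number) pairs.
buffer: list[tuple[str, int]] = []


def process_all() -> list[str]:
    """Drain the buffer newest-first, checking each packet against its own slot."""
    results = []
    while buffer:
        station, seq = buffer.pop()
        if seq == slots[station]:
            slots[station] += 1
            results.append(f"{station}:{seq} ok")
        else:
            results.append(f"{station}:{seq} discarded")
    return results


buffer.extend([("A", 1), ("B", 1), ("C", 1)])  # arrival order A, B, C
round1 = process_all()                          # served C, B, A
buffer.extend([("B", 2), ("A", 2), ("C", 2)])  # each station's next request
round2 = process_all()
print(round1, round2, slots)
```

Every packet is accepted even though none is served in arrival order, because each one is checked only against its own workstation's counter.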
SOLUTION: Because the problem described above lies in the operating
system itself, the best way to eliminate it would be to modify
the operating system code. This change could be considered for
a future release of NetWare.
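This bulletin does not say how the operating system should be changed. One conceivable change, assumed here purely for illustration and not necessarily the fix Novell made, is to service the receive buffer first-in, first-out, so that the original and retry Allocate Slot requests are both handled before the workstation's first ordinary request. The sketch below replays the failing scenario both ways; run_scenario is a hypothetical helper and only requests are numbered.

```python
from collections import deque

ALLOCATE = "allocate"


def run_scenario(fifo: bool) -> list[str]:
    """Replay the duplicate-Allocate scenario with a FIFO or LIFO receive buffer."""
    buffer: deque = deque()
    expected_seq = 0
    log: list[str] = []

    def process_next() -> None:
        nonlocal expected_seq
        # FIFO takes the oldest packet; LIFO takes the newest.
        kind, seq = buffer.popleft() if fifo else buffer.pop()
        if kind == ALLOCATE:
            expected_seq = 1  # slot parameters (re)initialized
            log.append("slot allocated")
        elif seq == expected_seq:
            expected_seq += 1
            log.append(f"request {seq} processed")
        else:
            log.append(f"request {seq} discarded")

    buffer.append((ALLOCATE, 0))   # original request; server is busy
    buffer.append((ALLOCATE, 0))   # retry after the shell times out
    process_next()                 # first reply reaches the workstation
    buffer.append(("request", 1))  # workstation's next request
    process_next()
    process_next()
    buffer.append(("request", 2))
    process_next()
    return log


lifo_log = run_scenario(fifo=False)
fifo_log = run_scenario(fifo=True)
print(lifo_log)  # ends with 'request 2 discarded'
print(fifo_log)  # both allocations first, then requests 1 and 2 processed
```

In this model, FIFO servicing handles the duplicate request before request 1, so its reset of the sequence number does no harm.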
Because the error is noncritical and recoverable, no immediate
solution is being sought. This decision is based on the
following facts: the error occurs only under a very specific,
rare set of circumstances; even then it occurs only about 15%
of the time; and it is easily recovered from by rebooting and
reloading the shell. Once the shell is successfully loaded, no
further problems are experienced.
TIC: date=3-30-87, ref#=031887.008, status=RESOLVED